A System for Compound Noun Multiword Expression Extraction for Hindi

Abstract

Compound noun multiword expressions (MWEs) are important for many NLP applications, such as machine translation and information retrieval. This paper describes a system for extracting Hindi compound noun MWEs from a given corpus. We identify the major categories of compound noun MWEs based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-occurrence measures to exploit the statistical idiosyncrasy of MWEs, and we use lexical cues from the corpus to enhance these methods. We also address the extraction of reduplicative expressions using lexical, semantic and phonetic knowledge, and we have built an evaluation resource of compound noun MWEs for Hindi. Our methods give a recall of 80% and precision of 23% at
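To illustrate what a statistical co-occurrence measure looks like in practice, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is used here only as one common example of such a measure; the abstract does not say which measures the system uses. The function name `pmi_scores`, the `min_count` threshold and the toy tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy token sequence (illustrative placeholder data, not from the paper's corpus).
tokens = ["rel", "gadi", "x", "rel", "gadi", "y", "x", "y"]
scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs that co-occur more often than their individual frequencies would predict receive a high score and become MWE candidates. A real system would typically restrict candidates to noun-noun sequences and combine several measures.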


Similar resources

Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest

This paper presents a supervised machine learning approach that uses the Random Forest algorithm to recognize Bengali noun-noun compounds as multiword expressions (MWEs) in a Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multiword expressions using chunk information and various heuristic rules and (2) training the ...


Constituency Parser for Hindi Noun Sequences and Role of Bracketing in Translation of English Compound Nouns into Hindi

Complex noun sequences in Hindi can be formed by sequences of nouns and genitives. In Hindi, the genitive marker is “kā”, and its allomorphic variations are “ke” and “kī”. When two or more nouns occur without any intervening postpositions, the sequence is known as a compound noun. The following are some examples of complex noun sequences: (1) “jilā cunāva adhikārī” (district election officer), (2) “tila k...


A Machine Learning Approach for the Identification of Bengali Noun-Noun Compound Multiword Expressions

This paper presents a machine learning approach for identification of Bengali multiword expressions (MWE) which are bigram nominal compounds. Our proposed approach has two steps: (1) candidate extraction using chunk information and various heuristic rules and (2) training the machine learning algorithm called Random Forest to classify the candidates into two groups: bigram nominal compound MWE ...


Extraction and Analysis of English Noun-Noun Compounds with Chinese-English Parallel Corpora

The noun-noun compound is a common type of multiword expression in English. Like many other kinds of MWEs, it causes problems in natural language processing. In this paper, we extract noun-noun compounds using their POS tags. The extracted noun-noun compounds are then aligned to their Chinese translations using a word alignment method. The statistical analysis of the alignments shows that English noun-no...


Semantics-based Multiword Expression Extraction

This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within an MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributio...



Publication date: 2008